Abstract. We introduce an analytical model to study the evolution towards equilibrium in spatial games
with ‘memory-aware’ agents, i.e., agents that accumulate their payoff over time. In particular, we focus
on the spatial Prisoner’s Dilemma, as it constitutes an emblematic example of a game whose Nash
equilibrium is defection. Previous investigations showed that, under suitable conditions, it is possible
to reach, in the evolutionary Prisoner’s Dilemma, an equilibrium of cooperation. Notably, it seems that
mechanisms like motion may lead a population to become cooperative. In the proposed model, we map
agents to particles of a gas so that, on varying the system temperature, they randomly move. In doing
so, we are able to identify a relation between the temperature and the final equilibrium of the population,
explaining how it is possible to break the classical Nash equilibrium in the spatial Prisoner’s Dilemma
when considering agents able to increase their payoff over time. Moreover, we introduce a formalism to
study order-disorder phase transitions in these dynamics. As a result, we highlight that the proposed model
allows us to explain analytically how a population, whose interactions are based on the Prisoner’s Dilemma,
can reach an equilibrium far from the expected one, also opening the way to define a direct link between
evolutionary game theory and statistical physics.

PACS. 89.20.-a Complex Systems - 87.23.Cc Population dynamics and ecological pattern formation - 
05.90.+m Other topics in statistical physics, thermodynamics, and nonlinear dynamical systems 


1 Introduction 

Evolutionary games [?] represent an attempt to study the evolution of
populations [?] within the framework of game theory [7]. Notably, these
games allow us to analyze simplified scenarios in different domains,
spanning from socio-economic dynamics to biological systems [?]. In
general, evolutionary games consider a population of agents whose
interactions are based on games like the Prisoner’s Dilemma (hereinafter
PD) or the Hawk-Dove game [4], where there are two possible strategies:
cooperation and defection. As in classical game theory, the concept of
equilibrium represents a core aspect [?]. Therefore, we aim to evaluate
whether a population reaches an equilibrium equal to or different from
the expected one, i.e., the Nash equilibrium of the considered game. At
each interaction, agents gain a payoff according to the adopted strategy
and to a payoff matrix. The payoff represents a form of reward in the
considered domain (e.g., money in an economic system or food in an
ecosystem). Remarkably, as agents are allowed to change their strategy
over time, we can map them to spins with states $\sigma = \pm 1$,
representing cooperation and defection, respectively. In doing so, we
can analyze order-disorder transitions in the spatial PD. Previous
studies [?] have shown that, under particular conditions, a population
playing a game like the PD, i.e., a game characterized by defection as
Nash equilibrium, can reach a final state of full cooperation. For
instance, it seems that both motion and competitiveness [24] can lead an
agent population to cooperate [26] and, more generally, spatial
structure plays a key role in the evolution of cooperation [?]. Usually,
adding properties to agents, such as motion, conformity and
competitiveness, increases the complexity of the resulting model. Thus,
most investigations on evolutionary games are based on computational
approaches. Therefore, in this work we try to provide an analytical
description of the spatial PD, in order to explain how a population can
become cooperative and to strengthen the link between evolutionary game
theory [?] and statistical physics [?]. It is worth highlighting that we
consider ‘memory-aware’ agents, i.e., agents that accumulate their
payoff over time. Remarkably, this last condition represents the major
difference from most of the evolutionary game models studied by
computational approaches (see for instance [?]). On the other hand,
considering ‘memory-aware’ agents makes the problem more tractable from
an analytical perspective. The remainder of the paper is organized as
follows: Section 2 introduces the proposed model and its analytical
formulation. Section 3 shows analytical results. Finally, Section 4 ends
the paper.


2 Model 

In the proposed model, we are interested in studying the spatial
Prisoner’s Dilemma by an analytical approach. Let us start by
introducing the general form of a payoff matrix

\[
\begin{array}{c|cc}
  & C & D \\
\hline
C & R & S \\
D & T & P
\end{array}
\tag{1}
\]

where the set of strategies is $\Sigma = \{C, D\}$: C stands for
‘Cooperator’ and D for ‘Defector’. In the matrix (1), R is the gain
obtained by two interacting cooperators, T represents the Temptation,
i.e., the payoff that an agent gains if it defects while its opponent
cooperates, S the Sucker’s payoff, i.e., the gain achieved by a
cooperator while the opponent defects, and finally P the payoff of two
interacting defectors. In the case of the PD, the matrix elements of (1)
are: $R = 1$, $-1 \le S \le 0$, $1 \le T \le 2$ and $P = 0$. As stated
before, during the evolution of the system agents can change their
strategy from C to D, and vice versa, following an updating rule, for
instance the one named ‘imitation of the best’ (see [?]), where agents
imitate the strategy of their richest neighbor.
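As a minimal illustration of the payoff matrix and of the ‘imitation of the best’ rule described above, the following Python sketch may help. The function names, the values T = 1.5 and S = -0.5, the list of (strategy, payoff) pairs used to describe a neighborhood, and the tie-breaking choice are all assumptions made only for this example.

```python
# Payoff matrix of the PD with R = 1 and P = 0. T and S are illustrative
# values picked inside the ranges quoted in the text (assumption).
R, P = 1.0, 0.0
T, S = 1.5, -0.5

PAYOFF = {
    ("C", "C"): R,  # two cooperators
    ("C", "D"): S,  # cooperator facing a defector (Sucker's payoff)
    ("D", "C"): T,  # defector facing a cooperator (Temptation)
    ("D", "D"): P,  # two defectors
}


def payoff(own, other):
    """Payoff gained by an agent playing `own` against `other`."""
    return PAYOFF[(own, other)]


def imitate_best(own_strategy, own_payoff, neighbors):
    """Return the strategy of the richest agent among self and neighbors.

    `neighbors` is a list of (strategy, accumulated payoff) pairs.
    On ties the agent keeps its own strategy (a choice of this sketch).
    """
    best_strategy, best_payoff = own_strategy, own_payoff
    for strategy, wealth in neighbors:
        if wealth > best_payoff:
            best_strategy, best_payoff = strategy, wealth
    return best_strategy
```

For instance, `imitate_best("C", 2.0, [("D", 3.0), ("C", 1.0)])` returns `"D"`, because the richest agent in the neighborhood is a defector.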


2.1 Mean field approach 

Now, we consider a mixed population of N agents with, at the beginning,
an equal density of cooperators and defectors. Under the hypothesis that
all agents interact together, at each time step the payoffs gained by
cooperators and defectors are computed as follows

\[
\begin{cases}
\pi_c = (\rho_c \cdot N - 1) + (\rho_d \cdot N) S \\
\pi_d = (\rho_c \cdot N) T
\end{cases}
\tag{2}
\]

with $\rho_c + \rho_d = 1$, $\rho_c$ density of cooperators and $\rho_d$
density of defectors. We recall that defection is the dominant strategy
in the PD and, even if we set $S = 0$ and $T = 1$, it corresponds to the
final equilibrium because $\pi_d$ is always greater than $\pi_c$ (in
that case $\pi_d - \pi_c = 1$). At this point, it is important to
highlight that previous investigations [?] have been performed with
‘memoryless’ agents (i.e., agents that do not accumulate the payoff over
time) whose interactions were defined only with their neighbors, and
focusing only on one agent (and on its neighbors) at a time. These
conditions are fundamental. For instance, if at each time step we
randomly select one agent interacting only with its neighbors, there is
a nonzero probability of consecutively selecting a number of close
cooperators; in this case, very rich cooperators may emerge and then
prevail over defectors, even without introducing mechanisms like motion.
It is also worth observing that, as $P = 0$, a homogeneous population of
defectors does not increase its overall payoff. Instead, according to
the matrix (1), a cooperative population continuously increases its
payoff over time.
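The mean-field payoffs can be checked numerically. The sketch below is only an illustration under stated assumptions: the function name `mean_field_payoffs` and the population size N = 100 are hypothetical choices, and the payoffs are those of a fully connected population with R = 1 and P = 0.

```python
def mean_field_payoffs(rho_c, n_agents, T, S):
    """Per-step payoffs of cooperators and defectors in a well-mixed group.

    rho_c is the density of cooperators (rho_d = 1 - rho_c).
    Returns the pair (pi_c, pi_d).
    """
    rho_d = 1.0 - rho_c
    pi_c = (rho_c * n_agents - 1) + (rho_d * n_agents) * S
    pi_d = (rho_c * n_agents) * T
    return pi_c, pi_d


N = 100
# Boundary case: S = 0 and T = 1. Defection still pays more for every mix.
for k in range(1, N):                      # k cooperators, N - k defectors
    pi_c, pi_d = mean_field_payoffs(k / N, N, T=1.0, S=0.0)
    assert pi_d > pi_c

# Memory-aware agents accumulate payoff: in a fully cooperative population
# each agent earns N - 1 per step, whereas full defection earns 0 (P = 0).
steps = 10
all_c = sum(mean_field_payoffs(1.0, N, T=1.0, S=0.0)[0] for _ in range(steps))
all_d = sum(mean_field_payoffs(0.0, N, T=1.0, S=0.0)[1] for _ in range(steps))
print(all_c, all_d)  # 990.0 0.0
```

The last two lines show why accumulation matters: the cooperative population keeps getting richer, while a population of defectors stays at zero.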

Now, we consider a population divided into two groups by a wall: a group
$G^a$ composed of cooperators, and a mixed group $G^b$, i.e., composed
of cooperators and defectors in equal amounts. Agents interact only with
members of the same group; hence the group $G^a$ never changes and, in
addition, it strongly increases its payoff over time. The opposite
occurs in the group $G^b$, as it converges to an ordered phase of
defection, limiting its final payoff. Remarkably, in this scenario, we
can introduce a strategy to modify the equilibria of the two groups. In
particular, we can change the equilibrium of $G^b$ to cooperation, and
that of $G^a$ to defection. In the first case, we have to wait a while
before moving one or a few cooperators to $G^b$: defectors increase
their payoff but, during the revision phase, they change strategy to
cooperation as the newcomers are richer than them. In the second case,
if we move, after a few time steps, a small group of defectors from
$G^b$ to $G^a$, the latter converges to a final defection phase. These
preliminary and theoretical observations reveal an important property of
the ‘memory-aware’ PD: considering the two different groups, cooperators
may succeed when they act after a long time and individually. Instead,
defectors may succeed by acting fast and in a group. Notably, rich
cooperators have to move individually, since otherwise many rich
cooperators risk increasing too much the payoff of defectors that, in
this case, will not change strategy. The opposite holds for defectors
that, acting in a group, may strongly reduce the payoff of a community
of cooperators (for $S < 0$).
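The first case of the two-group thought experiment can be reproduced with a toy simulation. This is a sketch under strong simplifying assumptions that are not taken from the text: every agent of a group plays with all other members at each step, the revision phase is a synchronous and deterministic ‘imitation of the best’ over the whole group, and the group sizes and the values T = 1.5, S = -0.5 are arbitrary. All function names are hypothetical.

```python
def play_round(strategies, wealth, T, S):
    """Each agent plays once with every other member of its group.

    Payoffs (R = 1, P = 0) are added to `wealth`, which is never reset
    because agents are memory-aware.
    """
    n_c = strategies.count("C")
    n_d = len(strategies) - n_c
    for i, s in enumerate(strategies):
        if s == "C":
            wealth[i] += (n_c - 1) * 1.0 + n_d * S
        else:
            wealth[i] += n_c * T


def revise(strategies, wealth):
    """Imitation of the best: everybody copies the richest group member."""
    richest = max(range(len(wealth)), key=wealth.__getitem__)
    best = strategies[richest]
    for i in range(len(strategies)):
        strategies[i] = best


def final_state_of_mixed_group(wait, n_a=50, n_b=50, T=1.5, S=-0.5):
    """Move one cooperator from the cooperative group to the mixed one
    after `wait` steps and return the final strategy of the mixed group."""
    # Cooperative group: each member earns (n_a - 1) per step.
    newcomer_wealth = float((n_a - 1) * wait)
    # Mixed group: half cooperators, half defectors, no initial wealth.
    strategies = ["C"] * (n_b // 2) + ["D"] * (n_b - n_b // 2)
    wealth = [0.0] * n_b
    for _ in range(wait):
        play_round(strategies, wealth, T, S)
        revise(strategies, wealth)
    # The newcomer crosses the wall, plays one round, then the group revises.
    strategies.append("C")
    wealth.append(newcomer_wealth)
    play_round(strategies, wealth, T, S)
    revise(strategies, wealth)
    return strategies[0]


print(final_state_of_mixed_group(wait=1))   # D: the newcomer is not rich enough
print(final_state_of_mixed_group(wait=10))  # C: a patient cooperator wins
```

With these illustrative parameters a cooperator that moves too early is exploited, while one that has accumulated enough payoff turns the whole mixed group to cooperation. The second case (a group of defectors invading the cooperative group) is not covered by this sketch.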


2.1.1 Mapping agents to gas particles 

We hypothesize that the spatial PD, with moving agents, can be
successfully studied within the framework of kinetic theory [?].
Therefore, in the proposed model, we map agents to particles of a gas.
In doing so, the average speed of particles is computed as

\[
\langle v \rangle = \sqrt{\frac{2 k_b T_s}{m_p}}
\]

with $T_s$ system temperature, $k_b$ Boltzmann constant, and $m_p$
particle mass. Particles are divided into two groups by a permeable
wall: it can be crossed by particles, but it prevents interactions among
particles belonging to different groups. Now, it is worth emphasizing
that we can provide a dual description of our system: one in the
‘physical’ domain of particles, the other in the ‘information’ domain of
agents. Notably, to analyze the system in the ‘information’ domain we
will introduce, as discussed above, the mapping of agents to a spin
system (see [33]). Summarizing, we map agents to gas particles in order
to represent their ‘physical’ property of motion, and we map agents to
spins to represent their ‘information’ property (i.e., their strategy).
Remarkably, these two mappings can be viewed as two different layers for
studying how the agent population evolves over time. Although the
physical property (i.e., the motion) affects the agent strategy (i.e.,
its spin), the equilibrium can be reached in both layers/domains
independently. This last observation is important since we are
interested in evaluating only the final equilibrium reached in the
‘information’ domain. Then, as stated before, agents interact only with
those belonging to the same group, so the evolution of the mixed group
$G^b$ can be described by the following equations

\[
\begin{cases}
\frac{d\rho_c^b(t)}{dt} = p_c^b(t) \cdot \rho_c^b(t) \cdot \rho_d^b(t) - p_d^b(t) \cdot \rho_d^b(t) \cdot \rho_c^b(t) \\
\frac{d\rho_d^b(t)}{dt} = p_d^b(t) \cdot \rho_d^b(t) \cdot \rho_c^b(t) - p_c^b(t) \cdot \rho_c^b(t) \cdot \rho_d^b(t) \\
\rho_c^b(t) + \rho_d^b(t) = 1
\end{cases}
\tag{3}
\]

with $p_c^b(t)$ probability that cooperators prevail over defectors (at
time t), and $p_d^b(t)$ probability that defectors prevail over
cooperators (at time t). These probabilities are computed according to
the payoffs obtained, at each time step, by cooperators and defectors

\[
\begin{cases}
p_c^b(t) = \frac{\pi_c^b(t)}{\pi_c^b(t) + \pi_d^b(t)} \\
p_d^b(t) = 1 - p_c^b(t)
\end{cases}
\tag{4}
\]

The system (3) can be solved analytically provided that, at each time
step, the values of $p_c^b(t)$ and $p_d^b(t)$ are updated. So, the
density of cooperators reads

\[
\rho_c^b(t) = \frac{\rho_c^b(0)}{\rho_c^b(0) - [(\rho_c^b(0) - 1) \cdot e^{\tau t / N^b}]}
\tag{5}
\]

with $\rho_c^b(0)$ initial density of cooperators in $G^b$,
$\tau = p_d^b(t) - p_c^b(t)$, and $N^b$ number of agents in $G^b$.
Recall that setting $T_s = 0$, not allowed in a thermodynamic system,
corresponds to a motionless case, leading to the Nash equilibrium in
$G^b$. Instead, for $T_s > 0$ we can find more interesting scenarios.
Now we suppose that, at time $t = 0$, particles of $G^a$ are much closer
to the wall than those of $G^b$ (later we will relax this constraint);
for instance, let us consider a particle of $G^a$ that, during its
random motion, is following a trajectory of length d (in the
n-dimensional physical space) towards the wall. Assuming this particle
is moving with speed equal to $\langle v \rangle$, we can compute the
instant of crossing $t_c = d / \langle v \rangle$, i.e., the instant
when it moves from $G^a$ to $G^b$. Thus, on varying the temperature
$T_s$, we can vary $t_c$.
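The dependence of the crossing instant on the temperature, and the decay of the cooperator density in the mixed group, can be sketched as follows. Several elements are assumptions of this example and not statements of the model: the prefactor 2 in the speed formula, the numerical value of the Boltzmann constant (a placeholder), a value of tau kept fixed over time, and all function names.

```python
import math

K_B = 1e-8   # placeholder value for the Boltzmann constant (assumption)
M_P = 1.0    # particle mass
D = 1.0      # length of the trajectory towards the wall


def average_speed(t_s, k_b=K_B, m_p=M_P):
    """Average particle speed, assumed here to be sqrt(2 * k_b * T_s / m_p)."""
    return math.sqrt(2.0 * k_b * t_s / m_p)


def crossing_time(t_s, d=D, k_b=K_B, m_p=M_P):
    """Crossing instant t_c = d / <v>; infinite in the motionless case."""
    v = average_speed(t_s, k_b, m_p)
    return math.inf if v == 0.0 else d / v


def cooperator_density(t, rho0, tau, n_b):
    """Closed-form density of cooperators in the mixed group for fixed tau.

    rho0 is the initial density, tau = p_d - p_c, n_b the group size.
    """
    return rho0 / (rho0 - (rho0 - 1.0) * math.exp(tau * t / n_b))


# Hotter systems move faster, so the cooperator crosses the wall earlier.
for t_s in (0.1, 9.0, 15.0, 50.0):
    print(t_s, crossing_time(t_s))

# With tau > 0 (defectors prevail) the density of cooperators decays.
print(cooperator_density(0, 0.5, 0.2, 100), cooperator_density(1000, 0.5, 0.2, 100))
```

Since the speed grows as the square root of the temperature, quadrupling the temperature halves the crossing instant, whatever the placeholder constants are.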

Let us consider the payoff of cooperators in the two groups. Each
cooperator in $G^a$ gains

\[
\pi_c^a = (\rho_c^a \cdot N^a - 1) \cdot t
\tag{6}
\]

On the other hand, the situation for cooperators in $G^b$ is very
different as, according to the Nash equilibrium, their number decreases
over time. Therefore, we can consider how the payoff of the last
surviving cooperator in $G^b$ changes

\[
\pi_c^b = \sum_{i=0}^{t} [(\rho_c^b \cdot N^b - 1) + (\rho_d^b \cdot N^b) S]_i
\tag{7}
\]
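The two accumulated payoffs can be compared with a short sketch. It assumes, only for illustration, that the density of cooperators in the mixed group decays with a fixed value of tau, and that group sizes and S take arbitrary values; the function names are hypothetical.

```python
import math


def payoff_a(t, n_a, rho_c_a=1.0):
    """Accumulated payoff of a cooperator in the cooperative group after t steps."""
    return (rho_c_a * n_a - 1.0) * t


def payoff_b(t, n_b, S, rho_c_of):
    """Accumulated payoff of the last cooperator in the mixed group.

    `rho_c_of(i)` returns the cooperator density of the group at step i;
    the per-step terms are summed for i = 0, ..., t.
    """
    total = 0.0
    for i in range(t + 1):
        rho_c = rho_c_of(i)
        rho_d = 1.0 - rho_c
        total += (rho_c * n_b - 1.0) + (rho_d * n_b) * S
    return total


def decaying_density(i, rho0=0.5, tau=0.2, n_b=100):
    """Cooperator density decaying with a fixed tau (assumption of this sketch)."""
    return rho0 / (rho0 - (rho0 - 1.0) * math.exp(tau * i / n_b))


t = 500
print(payoff_a(t, n_a=100))
print(payoff_b(t, n_b=100, S=-0.5, rho_c_of=decaying_density))
```

The cooperator in the cooperative group grows richer linearly in time, while the payoff of a cooperator left in the mixed group is eroded as defectors take over.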


Moreover, $\pi_c^b \to 0$ as $\rho_c^b \to 0$. At $t = t_c$, a new
cooperator reaches $G^b$, with a payoff computed with equation (6).


3 Results

The analytical solution (5) allows us to analyze the evolution of the
system and to evaluate how initial conditions affect the outcomes of the
model. Let us observe that, if $\pi_c^a(t_c)$ is big enough, the new
cooperator may modify the equilibrium of $G^b$, turning defectors into
cooperators. Notably, the payoff considered to compute $p_c^b$ after
$t_c$ corresponds to $\pi_c^a(t_c)$, as the newcomer is the richest
cooperator in $G^b$. Furthermore, we note that $\pi_c^a(t_c)$ depends on
$N^a$, hence we study the evolution of the system on varying the
parameter $\epsilon = N^a / N^b$, i.e., the ratio between particles in
the two groups. Finally, for numerical convenience, we set
$k_b = 1 \cdot 10^{[\ldots]}$, $m_p = 1$, and $d = 1$.

Figure 1 shows the evolution of $G^b$, for $\epsilon = 1$, on varying
$T_s$ and, depicted in the insets, the variation of the system
magnetization over time (always inside $G^b$) computed as

\[
M = \frac{1}{N^b} \sum_{i=1}^{N^b} \sigma_i
\tag{8}
\]

with $\sigma_i$ strategy of the i-th agent. As discussed before, in the
physical domain of particles, heating the system increases the average
speed of particles. Thus, under the assumption that two agents play
together if they stay close (i.e., in the same group) for a long enough
time, we hypothesize that there exists a maximum speed such that, for
greater values, interactions do not occur (in terms of game). This
hypothesis requires a critical temperature $T_c$, above which no
interactions, in the ‘information’ domain, are possible. As shown in
plot f of figure 1, for temperatures in the range $0 < T_s < T_{max}$
the system converges to a cooperation phase (i.e., $M = +1$), for
$T_{max} < T_s < T_c$ the system follows the Nash equilibrium (i.e.,
$M = -1$), and for $T_s > T_c$ a disordered phase emerges at
equilibrium. Remarkably, the results of our model suggest that it is
always possible to compute a range of temperatures that yields an
equilibrium of full cooperation (see figure 2). Moreover, we study the
variation of $T_{max}$ on varying $\epsilon$ (see figure 3), showing
that, even for low $\epsilon$, it is possible to obtain a time $t_c$
that allows the system to converge towards cooperation. Finally, we
investigate the relation between the maximum value of $T_s$ that allows
a population to become cooperative and its size N (i.e., the number of
agents). Remarkably, as shown in figure 4, the maximum $T_s$ scales with
N following a power-law function characterized by a scaling parameter
(i.e., an exponent) $\gamma \sim 2$. The value of $\gamma$ has been
computed by considering the values of $T_s$ obtained for the case
$\epsilon = 2$. It is also worth highlighting that all analytical
results reveal a link between the system temperature and its final
equilibrium. Recalling that we are not considering the equilibrium of
the gas, i.e., it does not thermalize in the proposed model, we
emphasize that the equilibrium is considered only in the information
domain.


3.1 Phase Transitions in the spatial PD


As discussed before, in the information domain we can
study the system by mapping agents to spins, whose value
represents their strategy. In addition, we can map the difference
between winning probabilities, of cooperators and defectors, to an
external magnetic field: $h = p_c^b - p_d^b$. In doing so, by the Landau
theory [?], we can analytically identify an order-disorder phase
transition. Notably, we analyze the free energy F of the spin system on
varying the control parameter m [35] (corresponding to the magnetization
M)

\[
F(m) = -hm \pm \frac{m^2}{2} + \frac{m^4}{4}
\tag{9}
\]

where the sign of the second term depends on the temperature, i.e.,
positive for $T_s > T_c$ and negative for $T_s < T_c$, recalling that
$T_c$ represents the temperature beyond which it is not possible to play
the PD due to the high speed of particles (according to the condition
discussed before). For the sake of clarity, we want to emphasize that
the free energy is introduced in order to evaluate the nature of the
final equilibrium achieved by the system. In particular, looking for the
minima of F allows us to investigate whether our population reaches the
Nash equilibrium, or different configurations (e.g., full cooperation).
Figure 5 shows a pictorial representation of the phase transitions that
occur in our system, on varying $T_s$ and the external field h. Finally,
the constraints related to the average speed of particles, and to the
distance between each group and the permeable wall, can in principle be
relaxed, as we can imagine extending this description to a wider system
with several groups (as done in previous investigations, e.g. [?]),
where agents are uniformly spread in the whole space. It is worth
highlighting that our results are completely in agreement with those
achieved by authors who studied the role of motion in the PD (as [?]),
explaining why clusters of cooperators emerge in their simulations [20].
We also recall that, in the proposed model, we are using memory-aware
agents, while in previous computational investigations agents reset
their payoff at each step, i.e., before starting new interactions.

Fig. 1. From a to e: Evolution of the group $G^b$ with N = 100 and
$\epsilon = 1$, on varying the temperature: a. $T_s = 0$. b. $T_s = 0.1$.
c. $T_s = 9$. d. $T_s = 15$. e. $T_s = 50$. Insets show the system
magnetization over time. The instant $t = t_c$ can be detected in plots
c, d, e as a discontinuity of the two lines (i.e., red and black).
f. Final magnetization M of $G^b$ for different temperatures ($T_c$
indicates the ‘critical temperature’).

Fig. 2. Maximum values of temperature $T_s$ that allow the group $G^b$
to converge to cooperation. Red values correspond to results computed
with $\epsilon = 0.5$, while blue values to those computed with
$\epsilon = 1$. Circles are placed in the TS diagram indicating values
of T and S, of the payoff matrix, used for each case. Even for high
values of T, and small values of S, it is possible to achieve
cooperation.

Fig. 3. Maximum value of system temperature that allows cooperation to
be achieved at equilibrium versus $\epsilon$ (i.e., the ratio between
particles in the two groups). Different colors identify different
trends, fitted by power-law functions. After the final green plateau,
temperatures are too high to play the spatial PD.

Fig. 4. Maximum value of $T_s$ to achieve full cooperation at
equilibrium as a function of N, i.e., the size of the population. The
fitting function (dotted line) is a power-law characterized by a scaling
parameter equal to 2.
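The role of the free energy in selecting the final equilibrium can be explored numerically. The following sketch assumes a Landau form with a linear field term, a quadratic term whose sign flips at the critical temperature, and a quartic term; the grid scan, its bounds, and the function names are choices made only for this example, not part of the model.

```python
def magnetization(spins):
    """Average spin: +1 means full cooperation, -1 full defection."""
    return sum(spins) / len(spins)


def free_energy(m, h, above_tc):
    """Landau free energy F(m) = -h*m +/- m^2/2 + m^4/4.

    The quadratic term is positive above the critical temperature and
    negative below it.
    """
    sign = 1.0 if above_tc else -1.0
    return -h * m + sign * m**2 / 2.0 + m**4 / 4.0


def equilibrium_m(h, above_tc, lo=-2.0, hi=2.0, steps=4001):
    """Locate the global minimum of F on a regular grid (a plain scan)."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return min(grid, key=lambda m: free_energy(m, h, above_tc))


print(equilibrium_m(0.0, above_tc=True))    # disordered phase, m = 0
print(equilibrium_m(0.1, above_tc=False))   # ordered phase, m > 0 (cooperation)
print(equilibrium_m(-0.1, above_tc=False))  # ordered phase, m < 0 (defection)
```

Below the critical temperature the sign of the field h, i.e., which strategy is more likely to prevail, selects which of the two ordered phases is the global minimum; above it the minimum sits at (or near) zero magnetization.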


4 Conclusions 

To conclude, in this work we provide an analytical description of the
spatial Prisoner’s Dilemma, by using the framework of statistical
physics, studying the particular case of agents provided with memory of
their payoff (defined as memory-aware agents). This condition entails
that their payoff is not reset at each time step, so that they can
increase it over time. In particular, we propose a model based on the
kinetic theory of gases, showing how motion may lead a population
towards an equilibrium far from the expected one (i.e., the Nash
equilibrium). Remarkably, the final equilibrium depends on the system
temperature, so that we have been able to identify a range of
temperatures that triggers cooperation for all values of the payoff
matrix (related to the PD). In addition, we found an interesting
relation between the maximum temperature that fosters cooperation and
the size of the system. Notably, a scaling parameter in that relation
has been computed by investigating different orders of magnitude of the
size of the system. Furthermore, the dynamics of the resulting model
have also been described in terms of order-disorder phase transitions.
Finally, we deem that our results open the way to define a direct link
between evolutionary game theory and statistical physics.